Extracting Various Types of Informative Web Content via Fuzzy Sequential Pattern Mining
Abstract
In this paper, we present a web content extraction method that extracts different types of informative web content from news web pages. A fuzzy sequential pattern mining method, namely FSP, is developed to gradually discover fuzzy sequential patterns for various types of informative web content. Because the usage of HTML tags may change as web technology develops, the fuzzy sequential patterns are mined using a stable feature: the number of tokens in each line of source code. We have conducted extensive experiments and observed good clustering properties for the discovered sequential patterns. Experimental results demonstrate that the FSP method is effective compared with state-of-the-art content extraction methods. Besides the main articles of web pages, it can also effectively find other types of interesting web content, such as article recommendations and article titles.
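The feature named in the abstract, the number of tokens in each line of source code, can be illustrated with a minimal sketch. The whitespace tokenization, the function name `token_counts`, and the sample page are assumptions made for illustration only; the abstract does not state how FSP tokenizes lines or how the fuzzy patterns are then mined.

```python
def token_counts(html_source: str) -> list[int]:
    """Count tokens on each line of HTML source.

    Assumption: a token is any whitespace-separated chunk. The paper's
    actual tokenization rule is not given in the abstract.
    """
    return [len(line.split()) for line in html_source.splitlines()]


# Hypothetical news page: short markup lines surround one long text line.
sample = """<html>
<body>
<div class="nav"><a href="/">Home</a></div>
<p>Officials said on Tuesday that the new bridge will open next month after two years of work.</p>
</body>
</html>"""

counts = token_counts(sample)
for line_number, count in enumerate(counts, start=1):
    print(line_number, count)
```

In this toy input the article line carries far more tokens than the surrounding markup lines, which suggests why a per-line token count might stay informative even when tag conventions change.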
Similar resources
High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences
Nowadays, high fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns whose utility meets a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...
Performance Evaluation on State of the Art Sequential Pattern Mining Algorithms
Data mining refers to extracting or mining knowledge from large amounts of data. Among the various data mining tasks sequential pattern mining is one of the most important tasks. It has broad applications in several domains such as the analysis of customer purchase patterns, web access patterns, seismologic data, and weather observations. Sequential pattern mining consists of mining ...
Distributed Sequential Pattern Mining: A Survey and Future Scope
Distributed sequential pattern mining is a data mining method for discovering sequential patterns from large sequential databases in a distributed environment. It is used in a wide range of applications, including web mining, customer shopping records, biomedical analysis, scientific research, etc. Much research has been done on sequential pattern mining in various distributed environments like Grid, Had...
A Study of Text Mining Methods, Applications, and Techniques
Data mining is used to extract useful information from large amounts of data. It is used to implement and solve different types of research problems. The related research areas in data mining are text mining, web mining, image mining, sequential pattern mining, spatial mining, medical mining, multimedia mining, structure mining and graph mining. Text mining also referred to text of data mini...
Sequential Rule Mining in M-Learning Domain
Sequential rule mining is becoming an important tool in the m-learning domain for converting data into information. It is commonly used in a wide range of profiling practices, such as marketing, fraud detection and scientific discovery. Sequential rule mining is a specialized technique for extracting patterns from given data. These rules can be used to uncover patterns in...